ADAPTIVE PREDICTIVE PLAYOUT SCHEME FOR PACKET VOICE 
APPLICATIONS 



Field of Invention 

This invention relates to voice packet playout schemes and in particular to an
Adaptive Predictive Playout Scheme for Packet Voice Applications.

Background of the Invention 

The revolution in high-speed communication networks, an example of which is the
Internet, has given rise to the potential for enabling the deployment of multimedia
applications. These applications, however, require stringent quality of service (QoS)
guarantees, such as bounded delay and jitter. The current Internet was originally designed
to offer best effort service without any QoS guarantees. In such a packet switching
environment, the delay of each packet varies greatly due to the complexities of the
network traffic and to the traffic scheduling algorithms implemented for efficient
utilization of bandwidth. Voice data or speech packets are generally considered to be
transported at a variable bit rate (VBR). As a result, the problem of unbounded jitter,
introduced by the networks, often renders the speech unacceptable or even unintelligible.
It thus becomes essential to offer control mechanisms to obtain distinctive QoS
guarantees.

Essentially, voice applications can be broadly classified as either interactive or
unidirectional. Serving dissimilar purposes, these two classes of applications differ in
playout delay requirements and the tolerances for playout impairment. Interactive voice
applications are more sensitive to playout delay than playout impairment due to their
real-time nature. It is therefore acceptable in interactive voice applications to trade some
playout impairment for better playout delay.

Methods of buffering packets at the receiver end have been extensively studied.
Such prior art methods include I-Policy and E-Policy [W.E. Naylor and L. Kleinrock,
"Stream Traffic Communication in Packet Switched Networks: Destination Buffering
Considerations", IEEE Transactions on Communications, Vol. COM-30, No. 12, Dec
1982; and D.L. Stone and K. Jeffay, "An Empirical Study of Delay Jitter Management
Policies", Multimedia System, pp.267-279, Vol. 2, No.6, Jan 1995]. However, these
schemes do not adapt to traffic conditions, such as delay and jitter, which may vary from
time to time. Adaptive playout schemes have also been proposed based on an assumption
that the level of traffic conditions like delay jitter for the near future can be estimated in
terms of the observed level in the recent past [D.L. Stone and K. Jeffay, "An Empirical
Study of Delay Jitter Management Policies", Multimedia System, pp.267-279, Vol. 2,
No.6, Jan 1995].

It is therefore an aspect of an object of the invention to provide a control
mechanism for improving the utilization of resources and for optimizing service
performance.

Summary of the Invention 

According to an aspect of the invention, there is provided an adaptive predictive
playout scheme, based on a Least Mean Square (LMS) prediction algorithm, for packet
voice applications. The packets are received and stored in a buffer for playout at a
constant draining rate P0, where P0 is determined by the codec used.

When the number of packets in the buffer is greater than L0, the arrival interval of
the next incoming packet is predicted based on the LMS prediction algorithm. If the
estimated arrival interval for the next packet is smaller than a draining threshold D0, then
the next packet is predicted to arrive at the destination relatively early and thus is predicted
to be buffered relatively longer than the previously received packets. However, if the
oldest packet in the buffer is discarded, then the latency (time of packet in the buffer) of
the next packet is expected to be reduced, but without increasing the probability of causing
a gap, as there are a number of packets in the buffer queue.

If, however, the prediction for the next packet arrival interval is greater than the
draining threshold D0, then this next packet is expected to arrive at a time when all of the
packets have been played out, thus no packets need to be discarded. With such a
prediction, the receiver continues to play out the remaining packets provided that the
maximum acceptable playout latency is not exceeded. After the playout of the last packet
of the talkspurt, or in the event that no packet has arrived for some time since the arrival of
the last packet, the talkspurt playout is finished. The receiver starts or resets to play out the
next talkspurt.

Brief Description of the Drawings 

In the accompanying drawings: 
Figure 1 is a time-line diagram illustrating voice source behavior; 



Figure 2 is a block diagram illustrating a linear predictor according to the invention; 
Figure 3 is a block diagram illustrating an adaptive linear predictor according to the 
invention; 

Figure 4 is a flowchart illustrating LMS prediction algorithms according to the invention; 
Figure 5 is a block diagram illustrating an adaptive prediction playout mechanism utilizing 
LMS prediction algorithms of Figure 4; and 

Figure 6 shows flowcharts illustrating an adaptive predictive playout scheme in accordance
with Figure 5.

Detailed Description of the Preferred Embodiments 

For voice data as shown in Figure 1, during a talkspurt of duration 1/a, packets of
speech are generated at fixed intervals T. During silence periods, no packets are
generated. At the receiver end, the received constant-size voice packets are played out at a
constant bit rate.

Talkspurts of speech are of relatively short duration (1/a). At the receiver end, the
packet arrival intervals for talkspurts are assumed to be statistically stationary.
Consequently, an LMS prediction algorithm can be used to predict the packet arrival
intervals.
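As a minimal sketch of the quantity being predicted, the arrival interval series can be
derived from receiver-side arrival timestamps. The function name `arrival_intervals`, the
millisecond unit, and the example packetization interval T = 20 ms are assumptions made
for illustration only; they do not come from the specification.

```python
def arrival_intervals(arrival_times_ms):
    """Return the gaps between consecutive packet arrivals.

    arrival_times_ms: non-decreasing receiver timestamps (hypothetical input,
    in milliseconds) for the packets of one talkspurt.
    """
    return [later - earlier
            for earlier, later in zip(arrival_times_ms, arrival_times_ms[1:])]


# Packets sent every T = 20 ms (assumed value) but delayed unevenly by the network.
intervals = arrival_intervals([0, 22, 39, 61, 80])
print(intervals)
```

The resulting list is the series that the predictor described below would operate on.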

Thus, where $x(t)$ ($t = 0, 1, 2, \ldots$) denotes a series of packet arrival intervals, the
problem of voice packet arrival interval series prediction involves predicting the value of
$x(t + l)$ from the known $x(t - n + 1), x(t - n + 2), \ldots, x(t)$, where $x(t)$ is the
arrival interval of the most recently received packet. When $l = 1$, this process is referred
to as one-step prediction. The well-known least mean square (LMS) error linear prediction is
based on the Wiener-Hopf equations, whereby a $k$-step linear predictor predicts $x(n + k)$
using a linear combination of the current and previous values of $x(n)$. Thus, the
$p$th-order linear prediction is obtained by the following equation:

$$\hat{x}(n+k) = \sum_{l=0}^{p-1} w(l)\,x(n-l) = \mathbf{w}^{T}\mathbf{x}(n) \qquad (3.1)$$

where $w(l)$ are the prediction filter coefficients, for $l = 0, 1, \ldots, p-1$. A linear
predictor is illustrated in Figure 2, where

$$\mathbf{w} = [w(0),\, w(1),\, \ldots,\, w(p-1)]^{T}$$

$$\mathbf{x}(n) = [x(n),\, x(n-1),\, \ldots,\, x(n-p+1)]^{T}$$

$$e(n) = x(n+k) - \hat{x}(n+k) \qquad (3.2)$$



From equations (3.1) and (3.2),

$$e(n) = x(n+k) - \mathbf{w}^{T}\mathbf{x}(n) \qquad (3.3)$$

The optimal linear predictor in the mean square sense is the one that minimizes the mean
square error $\xi$, where

$$\xi = E\{e(n)^{2}\} \qquad (3.4)$$

Since $\xi$ is a quadratic function of $\mathbf{w}$, it has a unique minimum. Therefore, the
vector $\mathbf{w}$ that minimizes $\xi$ is found by taking the gradient of $\xi$, setting it
equal to zero, and then solving for $\mathbf{w}$:

$$\nabla\xi = \nabla E\{e(n)^{2}\} = -2E\{e(n)\,\mathbf{x}(n)\} = 0$$

Substituting the value for $e(n)$ from equation (3.3),

$$\nabla\xi = -2E\{[x(n+k) - \mathbf{w}^{T}\mathbf{x}(n)]\,\mathbf{x}(n)\} = 0$$

Then

$$E\{x(n+k)\,\mathbf{x}(n)\} = E\{[\mathbf{w}^{T}\mathbf{x}(n)]\,\mathbf{x}(n)\} \qquad (3.5)$$

If $x(n)$ ($n = 0, 1, 2, \ldots$) is wide-sense stationary, the correlation between $x(n)$ and
$x(n+k)$ is only a function of $k$, denoted $r_x(k)$:

$$r_x(k) = E\{x(n+k)\,x(n)\} \qquad (3.6)$$



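From the left side of equation (3.5),

$$E\{x(n+k)\,\mathbf{x}(n)\} = \begin{bmatrix} r_x(k) \\ r_x(k+1) \\ \vdots \\ r_x(k+p-1) \end{bmatrix} = \mathbf{r}(k)$$

From the right side of equation (3.5),

$$E\{[\mathbf{w}^{T}\mathbf{x}(n)]\,\mathbf{x}(n)\} = E\{\mathbf{x}(n)\,\mathbf{x}(n)^{T}\}\,\mathbf{w} = \begin{bmatrix} r_x(0) & r_x(1) & \cdots & r_x(p-1) \\ r_x(1) & r_x(0) & \cdots & r_x(p-2) \\ \vdots & \vdots & \ddots & \vdots \\ r_x(p-1) & r_x(p-2) & \cdots & r_x(0) \end{bmatrix} \mathbf{w} = \mathbf{R}_x\mathbf{w}$$

where $\mathbf{w}$ is the vector of coefficients, $\mathbf{R}_x$ is a $p \times p$ Hermitian
Toeplitz matrix of auto-correlations, and $\mathbf{r}(k)$ is the vector of cross-correlations
between the predicted value $x(n+k)$ and $\mathbf{x}(n)$.
Thus,

$$\mathbf{R}_x\mathbf{w} = \mathbf{r}(k) \qquad (3.7)$$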



The equations in (3.7) are the Wiener-Hopf equations for linear prediction. For a
one-step prediction ($k = 1$), the set of linear equations in (3.7) is equivalent to the set of
linear equations used to fit a $p$th-order autoregressive (AR) process with the exception of a
minus sign. The solution to the equations in (3.7) requires knowledge of the auto-correlation
of $x(n)$, and it also assumes that $x(n)$ is wide-sense stationary, i.e., the mean,
variance, and auto-covariance of $x(n)$ do not change with time. It also requires inverting
$\mathbf{R}_x$, whose size depends on the order $p$ of the linear predictor.
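To make the cost of the direct solution concrete, the following sketch builds the Toeplitz
matrix from a given auto-correlation sequence and solves the Wiener-Hopf system by Gaussian
elimination. It is an illustration only: the helper names, the use of plain Python lists, and
the example auto-correlation $r_x(j) = 0.5^{j}$ are assumptions, and the matrix is assumed to
be non-singular.

```python
def toeplitz_autocorr(r, p):
    """Build the p x p matrix R_x with entries r_x(|i - j|)."""
    return [[r[abs(i - j)] for j in range(p)] for i in range(p)]


def solve_linear(a, b):
    """Solve a w = b by Gaussian elimination with partial pivoting.

    Assumes the matrix a is square and non-singular.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    w = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[row][c] * w[c] for c in range(row + 1, n))
        w[row] = (m[row][n] - tail) / m[row][row]
    return w


def wiener_hopf_predictor(r, p, k=1):
    """Return w solving R_x w = r(k), given r[j] = r_x(j) for j = 0..k+p-1."""
    r_matrix = toeplitz_autocorr(r, p)
    r_k = [r[k + j] for j in range(p)]
    return solve_linear(r_matrix, r_k)


# Hypothetical auto-correlation r_x(j) = 0.5**j, chosen only for illustration.
r = [0.5 ** j for j in range(4)]
w = wiener_hopf_predictor(r, p=2, k=1)
print(w)
```

For this particular auto-correlation the second coefficient vanishes, so a first-order
predictor would already be optimal; real arrival-interval statistics are not known in
advance, which is the motivation for the adaptive approach that follows.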

LMS for prediction does not require any prior knowledge of the auto-correlation of
a sequence. Therefore, it can be used as an on-line algorithm to predict time intervals. A
signal diagram of an adaptive linear predictor is shown in Figure 3. The prediction
coefficients $\mathbf{w}(n)$ are time-varying. The errors $\{e(n)\}$ are fed back and used to
adapt the filter coefficients in order to decrease the mean square error. As time progresses,
the $p$ latest values of $x(n)$ are captured to predict the value of $x(n+1)$, in the manner
of a sliding window over a timeline to predict the next value in terms of a few of the latest
values.

The steps of an LMS prediction algorithm according to the invention are: start with
an initial estimate of the filter (prediction) coefficients $\mathbf{w}(0)$; and for each new
data point, compute $\nabla\xi$, where

$$\nabla\xi = -2E\{e(n)\,\mathbf{x}(n)\}$$



In practice, the statistics are not known and may change with time. Therefore, the
expectation operator $E$ is replaced with an estimate. The simplest estimate is the one-point
sample average $e(n)\,\mathbf{x}(n)$. The gradient $\nabla\xi$ is then used to update
$\mathbf{w}(n)$ by taking a step of size $0.5\mu$ ($\mu$ is an adaptation constant for
adjusting the prediction errors) in the direction of the negative gradient. The update
equations for the LMS filter coefficients are:

$$\mathbf{w}(n+1) = \mathbf{w}(n) - 0.5\,\mu\,\nabla\xi$$

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,e(n)\,\mathbf{x}(n) \qquad (3.7b)$$

If $x(n)$ is stationary, $\mathbf{w}(n)$ converges in the mean to the optimal solution of
$\mathbf{R}_x\mathbf{w} = \mathbf{r}(k)$. The LMS thus converges in the mean if
$0 < \mu < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the maximum eigenvalue of
$\mathbf{R}_x$.

According to the invention, a normalized LMS (NLMS) is a modification to the
LMS algorithm where the update equation is:

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,\frac{e(n)\,\mathbf{x}(n)}{\|\mathbf{x}(n)\|^{2}} \qquad (3.8)$$

where $\|\mathbf{x}(n)\|^{2} = \mathbf{x}(n)^{T}\mathbf{x}(n)$. NLMS has the advantage over
LMS of less sensitivity to the step size $\mu$. Using a large $\mu$ results in faster
convergence and a quicker response to signal changes. However, after convergence, the
prediction parameters have larger fluctuations. On the other hand, using a small $\mu$
results in slower convergence, but smaller fluctuations after convergence. There is a
tradeoff between faster convergence and smaller fluctuations.
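The two coefficient updates can be compared side by side in a short sketch. The function
names, the zero initial coefficients, the example window values, and the small `eps` guard
against division by zero are all assumptions added for illustration; only the update rules
themselves follow the LMS and NLMS equations given above.

```python
def predict(w, x):
    """Linear prediction w^T x for coefficient list w and input window x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def lms_step(w, x, target, mu):
    """One LMS update: w <- w + mu * e * x. Returns (new_w, e)."""
    e = target - predict(w, x)
    return [wi + mu * e * xi for wi, xi in zip(w, x)], e


def nlms_step(w, x, target, mu, eps=1e-12):
    """One NLMS update: w <- w + mu * e * x / ||x||^2. Returns (new_w, e).

    eps is an added guard against division by zero (an assumption,
    not part of the NLMS equation in the text).
    """
    e = target - predict(w, x)
    energy = sum(xi * xi for xi in x) + eps
    return [wi + mu * e * xi / energy for wi, xi in zip(w, x)], e


w0 = [0.0, 0.0]
window = [20.0, 18.0]   # hypothetical recent intervals in ms, newest first
target = 21.0           # hypothetical interval actually observed next

w_lms, e_lms = lms_step(w0, window, target, mu=0.001)
w_nlms, e_nlms = nlms_step(w0, window, target, mu=1.0)

# With mu = 1 the NLMS update cancels the error on this same window.
residual = target - predict(w_nlms, window)
```

Because the NLMS step is divided by the energy of the input window, the effective step does
not grow with the magnitude of the intervals, which is one way to read the reduced
sensitivity to the step size noted above.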

A flowchart of LMS prediction according to an embodiment of the invention is
shown in Figure 4. The steps are: step 400, at the start of a talkspurt, n = 0 and an initial
w(n) is estimated; step 405, a packet is received and a packet arrival interval x(n) is
obtained; step 410, the next packet arrival interval x(n+1) is predicted or calculated; step
415, another packet is received and the next packet arrival interval x(n+1) is obtained; step
420, the error e(n) is calculated using equation (3.3); step 425, an updated coefficient
vector w(n+1) is calculated using equation (3.8), where the normalized LMS prediction
algorithm is used; step 430, the LMS prediction algorithm to calculate x(n) is updated with
the parameters w(n+1) and x(n+1), and the last interval parameters w(n-p+1) and x(n-p+1)
are dropped; and step 435, go to step 410 until the talkspurt ends.
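The loop of Figure 4 can be sketched as a small online predictor. This is one possible
reading of the steps, not the implementation of the invention: the class name, the zero
initial coefficients, the zero-filled initial window, and the choice to skip the update while
the window has no energy are assumptions. The defaults p = 3 and step size 0.05 are taken
from the simulation parameters listed later in this description. The sketch keeps all p
coefficients and only slides the interval window.

```python
from collections import deque


class NlmsIntervalPredictor:
    """One-step NLMS predictor over a sliding window of p arrival intervals."""

    def __init__(self, p=3, mu=0.05):
        self.p = p
        self.mu = mu
        self.w = [0.0] * p                        # step 400: initial coefficients
        self.window = deque([0.0] * p, maxlen=p)  # newest interval first

    def predict(self):
        """Step 410: predict the next arrival interval from the window."""
        return sum(wi * xi for wi, xi in zip(self.w, self.window))

    def update(self, interval):
        """Steps 415-430: observe an interval, adapt w, slide the window."""
        error = interval - self.predict()         # step 420: prediction error
        energy = sum(xi * xi for xi in self.window)
        if energy > 0.0:                          # no update on an all-zero window
            scale = self.mu * error / energy      # step 425: normalized step
            self.w = [wi + scale * xi for wi, xi in zip(self.w, self.window)]
        self.window.appendleft(interval)          # step 430: oldest interval drops out
        return error


# Constant 20 ms intervals (hypothetical input): the error shrinks at each step.
predictor = NlmsIntervalPredictor(p=1, mu=0.5)
errors = [abs(predictor.update(20.0)) for _ in range(12)]
```

A fresh predictor would be created, or the existing one reset, at the start of each
talkspurt, matching step 400.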

According to another embodiment as also shown in Figure 4, equation (3.7b) is
substituted for equation (3.8) in step 425 where the LMS prediction algorithm is used.

An adaptive predictive playout mechanism, based on LMS prediction of Figure 4,
is shown in Figure 5. It is composed of three components: 1) a smoothing buffer 10, 2) an
LMS traffic predictor 12, and 3) a CBR (Constant Bit Rate) player 14. The arriving
packets are queued in the smoothing buffer 10. LMS predictor 12 employs an online
algorithm as shown in Figure 4, using the normalized LMS prediction algorithm, to predict
the arrival interval of the next incoming packet. Based on the predicted packet arrival
interval, the CBR player 14 derives an adaptive buffer delay by means of discarding the
oldest packets in the buffer if necessary.

The first few packets of each talkspurt are buffered to smooth the jitter. There are
two conditions for starting playout of packets: the current buffer length Q is greater than the
buffer threshold L0, and the queuing time B of the oldest packet in the buffer is greater than
the maximum acceptable playout latency T0. Whenever either of these two conditions is met,
the CBR player 14 starts playout of packets at a constant bit rate.
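The start condition can be expressed as a single predicate. The function and parameter names
are hypothetical, the millisecond unit is an assumption, and the strict "greater than"
comparisons follow the wording of the paragraph above.

```python
def should_start_playout(queue_len, oldest_wait_ms, l0, t0_ms):
    """Return True once either start condition holds.

    queue_len      -- Q, packets currently in the smoothing buffer
    oldest_wait_ms -- B, queuing time of the oldest buffered packet
    l0             -- L0, buffer threshold in packets
    t0_ms          -- T0, maximum acceptable playout latency
    """
    return queue_len > l0 or oldest_wait_ms > t0_ms


# Example thresholds L0 = 50 packets and T0 = 150 ms (values used in the simulations).
print(should_start_playout(queue_len=51, oldest_wait_ms=80.0, l0=50, t0_ms=150.0))
```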

During the playout of the packets at a constant draining rate P0, where P0 is
determined by the codec used to encode the talkspurt into the packets, when the number of
packets in the buffer is greater than L0, the arrival interval of the next incoming packet is
predicted by the LMS predictor 12. If the estimated next arrival interval is smaller than the
draining threshold D0, then this packet is predicted to arrive at the destination relatively
early and thus to be buffered relatively longer than the previously received packets. If the
oldest packet in the buffer is discarded, the latency of the next incoming packet is expected
to be reduced, but without increasing the probability of causing a gap, as there are a number
of packets in the buffer queue. If the prediction of this packet arrival interval is greater
than the draining threshold D0, then this packet is expected to arrive at a time when all the
packets have been played out, thus no packet needs to be discarded. With such a prediction,
the receiver continues to play out the remaining packets provided that the maximum
acceptable playout latency is not exceeded. After the playout of the last packet of the
talkspurt, or when no packet has arrived for some time since the arrival of the last packet,
the talkspurt playout is finished. The receiver starts or resets to play out the next
talkspurt.

Flowcharts of the operation of the adaptive predictive playout mechanism of
Figure 5 are shown in Figure 6. The parameters B, T0, P0, D0, L0, and Q for the
mechanism are also shown. The steps of the operation are: step 600, waiting for a
talkspurt; step 610, receipt of a new talkspurt; step 620, initial smoothing of the packets of
the talkspurt, which comprises receiving packets 622 and holding the packets in the
smoothing buffer 10 until the current buffer length Q reaches the threshold L0 or B (=
queuing time of the oldest packet in the buffer) is greater than T0 (= maximum acceptable
playout latency) 624; and playout 640 of the packets in the buffer 10.

The playout 640 comprises step 642, to play out the oldest packet in buffer 10 at a
constant draining rate P0 as determined by the codec used to encode the packets; and step
644, in which the buffer length Q is checked to determine whether the last packet in the
buffer 10 has been played out. If it has been played out, go to step 600 to wait for the next
talkspurt; if not, go to step 642 to play out the next packet.

As the packets are being played out, further packets are also being received and
added to the buffer 10 (step 646). For each received packet, the LMS predictor 12 is
updated according to the normalized LMS prediction algorithm (step 648) and the buffer
length Q is checked to determine if Q is below the buffer threshold L0 (step 650). If Q is
greater than L0 then predict a next incoming packet arrival interval d (step 652). The
interval d is compared (step 654) with the draining threshold D0 to control possible
flooding, which is indicated where d is not greater than D0, and B is compared with T0 to
ensure that the maximum playout latency still remains acceptable, that is, that B is not
greater than T0. If either of the conditions in step 654 is not satisfied (d is not greater
than D0, or B is greater than T0) then discard the oldest packet in the buffer 10 (step
656).
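One reading of steps 650 to 656 is sketched below. The function and parameter names are
hypothetical, and the interpretation that the oldest packet is dropped when the predicted
interval d is not greater than D0, or when B exceeds T0, but only while Q is greater than L0,
is an assumption drawn from the description of step 654 rather than a confirmed detail of
Figure 6.

```python
def should_discard_oldest(queue_len, predicted_interval_ms, oldest_wait_ms,
                          l0, d0_ms, t0_ms):
    """Decide whether to drop the oldest buffered packet (steps 650-656).

    queue_len             -- Q, packets currently buffered
    predicted_interval_ms -- d, predicted arrival interval of the next packet
    oldest_wait_ms        -- B, queuing time of the oldest buffered packet
    l0, d0_ms, t0_ms      -- thresholds L0, D0 and T0
    """
    if queue_len <= l0:
        return False                                  # step 650: buffer too short
    early_arrival = predicted_interval_ms <= d0_ms    # step 654: flooding risk
    latency_exceeded = oldest_wait_ms > t0_ms         # step 654: latency bound
    return early_arrival or latency_exceeded          # step 656 when True


# Thresholds taken from the simulation settings: L0 = 50, D0 = 6 ms, T0 = 150 ms.
print(should_discard_oldest(60, 2.0, 100.0, l0=50, d0_ms=6.0, t0_ms=150.0))
```

Keeping the decision in a pure function separates the policy from the buffer and timer
handling, so thresholds can be varied without touching the playout loop.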

Various simulation scenarios have been tested using simulations of the adaptive
prediction playout scheme of the invention. Without limiting the scope of the invention,
the results of the estimated probabilistic QoS values (delay, delay jitter, loss and gap
probabilities) within the range of specified operating parameter values are provided herein
below. With parameter values specified as follows:

• Exponential packet arrival with the mean varying between 1.5 ms and 3 ms,

• Buffer threshold L0 of 50, 55, 60, 75 or 100 packets,

• Maximum acceptable playout latency T0 of 150 ms,

• Draining threshold D0 of 6 ms,

• Constant packet draining rate P0 of 1 packet every 1.5 ms or 3 ms,

• Packet length of 1024 bits,

• Prediction step size $\mu$ of 0.05, and

• Sliding window size p of 1, 3, 5 or 10;
it was observed that,

• As the sliding window size increased, the packet gap or lost probabilities (varying
between 0.3% and 1.1%) increased, with the most drastic deterioration occurring
when the window size jumped from 1 to 3; increasing the buffer threshold decreased
the values of gap or lost probabilities, which are annoying to voice users when they
are too high;

• As the buffer threshold increased, the mean of queuing delay (varying between 80
ms and 148 ms) also increased proportionally; decreasing window size improved the
delay with a very strong improvement occurring when window size was reduced
from 3 to 1; and

• Delay jitter statistics were not collected in the initial experimentation due to the fact
that their impact on voice QoS is accounted for by the packet lost or gap
probabilities, and the packet draining rate (which was constant).

Although preferred embodiments of the invention have been described herein, it
will be understood by those skilled in the art that variations may be made thereto without
departing from the scope of the invention or the scope of the appended claims.



